Ghodke, Sumukh and Timothy Baldwin (2007) An Investigation into the Interaction between Feature Selection and Discretization: Learning How and When to Read Numbers, In Proceedings of the 20th Australian Joint Conference on Artificial Intelligence (AI07), Gold Coast, Australia, pp. 48-57
Abstract
Pre-processing is an important part of machine learning, and has been shown to significantly improve the performance of classifiers. In this paper, we take a selection of pre-processing methods, focusing specifically on discretization and feature selection, and empirically examine their combined effect on classifier performance. In our experiments, we take 11 standard datasets and a selection of standard machine learning algorithms, namely one-R, ID3, naive Bayes, and IB1, and explore the impact of different forms of pre-processing on each combination of dataset and algorithm. We find that, in general, the combination of wrapper-based forward selection and naive supervised methods of discretization yields consistently above-baseline results.
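As a rough illustration of how discretization and wrapper-based forward selection can be combined, the sketch below bins numeric features and then greedily adds features while a held-out accuracy estimate improves. It is not the authors' implementation: equal-width binning (an unsupervised method) stands in for the discretization methods studied in the paper, a simple majority-vote lookup classifier stands in for one-R, ID3, naive Bayes, and IB1, and all function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def equal_width_bins(values, n_bins=2):
    """Map each numeric value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def discretize(rows, n_bins=2):
    """Discretize every column of a row-major numeric table."""
    columns = [equal_width_bins(col, n_bins) for col in zip(*rows)]
    return [list(r) for r in zip(*columns)]


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a majority-vote lookup on the chosen features."""
    table = defaultdict(Counter)
    for row, label in zip(rows, labels):
        table[tuple(row[f] for f in features)][label] += 1
    overall = Counter(labels)
    correct = 0
    for row, label in zip(rows, labels):
        votes = table[tuple(row[f] for f in features)].copy()
        votes[label] -= 1          # hold out the current instance
        votes = +votes             # drop labels whose count fell to zero
        if not votes:              # unseen combination: fall back to the global majority
            votes = overall.copy()
            votes[label] -= 1
        correct += votes.most_common(1)[0][0] == label
    return correct / len(labels)


def forward_select(rows, labels):
    """Wrapper-style forward selection: add a feature only if accuracy improves."""
    selected, best = [], loo_accuracy(rows, labels, [])
    remaining = set(range(len(rows[0])))
    while remaining:
        scored = [(loo_accuracy(rows, labels, selected + [f]), f)
                  for f in sorted(remaining)]
        score, feat = max(scored, key=lambda t: t[0])
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
rows = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [1.1, 4.5],
        [3.0, 5.1], [3.2, 0.9], [2.9, 3.1], [3.1, 4.4]]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

binned = discretize(rows, n_bins=2)
selected, best = forward_select(binned, labels)
print(selected, best)
```

The "wrapper" aspect is that candidate feature subsets are scored by running the classifier itself on held-out instances, rather than by a classifier-independent filter statistic.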
Similar resources
Baldwin, Timothy and Manuel Paul Anil Kumar Joseph (2009) Restoring Punctuation and Casing in English Text, In Proceedings of the 22nd Australian Joint Conference on Artificial Intelligence (AI09), Melbourne, Australia, pp. 547-556
This paper explores the use of machine learning techniques to restore punctuation and case in English text, as part of which it investigates the co-dependence of case information and punctuation. We achieve an overall F-score of .619 for the task using a variety of lexical and contextual features, and iterative retagging.
Feature Manipulation with Genetic Programming
Feature manipulation refers to the process by which the input space of a machine learning task is altered in order to improve the learning quality and performance. Three major aspects of feature manipulation are feature construction, feature ranking and feature selection. This thesis proposes a new filter-based methodology for feature manipulation in classification problems using genetic progra...
Ye, Patrick and Timothy Baldwin (2008) Towards Automatic Animated Storyboarding, In Proceedings of the 23rd Conference on Artificial Intelligence (AAAI-08), Chicago, USA, pp. 578-583
In this paper, we propose a machine learning-based NLP system for automatically creating animated storyboards using the action descriptions of movie scripts. We focus particularly on the importance of verb semantics when generating graphics commands, and find that semantic role labelling boosts performance and is relatively robust to the effects of
AI 2004: Advances in Artificial Intelligence, 17th Australian Joint Conference on Artificial Intelligence, Cairns, Australia, December 4-6, 2004, Proceedings
Kim, Su Nam and Timothy Baldwin (2007) Disambiguating Noun Compounds, In Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI-07), Vancouver, Canada, pp. 901-906
This paper is concerned with the interaction between word sense disambiguation and the interpretation of noun compounds (NCs) in English. We develop techniques for disambiguating word sense specifically in NCs, and then investigate whether word sense information can aid in the semantic relation interpretation of NCs. To disambiguate word sense, we combine the one sense per collocation heuristic...